8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation #26171
Conversation
This reverts commit 66f3919.
… to allocate object under lock from the non-empty region with enough capacity
👋 Welcome back xpeng! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request.
❗ This change is not yet ready to be integrated.
@pengxiaolong The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
Thanks for doing this work. It is huge progress. I've left a number of comments. I didn't have time to study/comment on all of the code, as I am out of office for the rest of this week. I'll look more when I return.
@@ -244,21 +244,18 @@ void ShenandoahRegionPartitions::establish_mutator_intervals(idx_t mutator_leftm
_leftmosts_empty[int(ShenandoahFreeSetPartitionId::Mutator)] = mutator_leftmost_empty;
_rightmosts_empty[int(ShenandoahFreeSetPartitionId::Mutator)] = mutator_rightmost_empty;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the heap lock here, so should not need to use atomic store operations. Atomic operations have a performance penalty that I think we want to avoid.
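For illustration, a minimal sketch of this point, assuming HotSpot's Atomic API and that the heap lock already serializes these writers (field names taken from the diff above):

// Under the heap lock, writers are serialized, so a plain store suffices:
_leftmosts_empty[int(ShenandoahFreeSetPartitionId::Mutator)] = mutator_leftmost_empty;

// An atomic store adds ordering and codegen constraints that buy nothing here:
Atomic::store(&_leftmosts_empty[int(ShenandoahFreeSetPartitionId::Mutator)], mutator_leftmost_empty);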
_capacity[int(ShenandoahFreeSetPartitionId::Mutator)] = mutator_region_count * _region_size_bytes;
_available[int(ShenandoahFreeSetPartitionId::Mutator)] =
  _capacity[int(ShenandoahFreeSetPartitionId::Mutator)] - _used[int(ShenandoahFreeSetPartitionId::Mutator)];
Atomic::store(_region_counts + int(ShenandoahFreeSetPartitionId::Mutator), mutator_region_count);
If we do need to use Atomic operations, would prefer &_region_counts[int(ShenandoahFreeSetPartitionId::Mutator)] notation for the array elements.
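Concretely, the two spellings address the same element; the reviewer prefers the second:

Atomic::store(_region_counts + int(ShenandoahFreeSetPartitionId::Mutator), mutator_region_count);  // pointer arithmetic
Atomic::store(&_region_counts[int(ShenandoahFreeSetPartitionId::Mutator)], mutator_region_count);  // array-element notation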
}

void ShenandoahRegionPartitions::increase_used(ShenandoahFreeSetPartitionId which_partition, size_t bytes) {
  shenandoah_assert_heaplocked();
Are you now calling this directly from the CAS allocators? So you want to avoid asserting that the heap is locked, and that is why the used accounting is made atomic?
My preference would be to avoid this need by counting the entire region as used at the time the region becomes directly allocatable.
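A sketch of that alternative, assuming a hypothetical make_directly_allocatable() hook that runs under the heap lock (the name is not from the patch):

// Hypothetical: charge the region's whole free capacity to the Mutator
// partition when it is published for lock-free allocation, so the CAS fast
// path needs no per-allocation accounting (and no atomics for it).
void ShenandoahRegionPartitions::make_directly_allocatable(ShenandoahHeapRegion* r) {
  shenandoah_assert_heaplocked();
  increase_used(ShenandoahFreeSetPartitionId::Mutator, r->free());
  // Any unused tail is reconciled when the region is retired, again under the lock.
}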
@@ -621,18 +612,31 @@ void ShenandoahRegionPartitions::assert_bounds() {
{
  size_t capacity = _free_set->alloc_capacity(i);
  bool is_empty = (capacity == _region_size_bytes);
  assert(capacity > 0, "free regions must have allocation capacity");
  // TODO remove assert, not possible to pass when allow mutator to allocate w/o lock.
Probably the preferred approach here is to "pre-retire" regions when they are made directly allocatable. When the region is pre-retired, it is taken out of the partition, so assert_bounds no longer applies to this region.
@@ -78,10 +79,9 @@ class ShenandoahRegionPartitions {
// are denoted in bytes. Note that some regions that had been assigned to a particular partition at rebuild time
// may have been retired following the rebuild. The tallies for these regions are still reflected in _capacity[p]
// and _used[p], even though the region may have been removed from the free set.
Prefer not to make these volatile, as that imposes a compiler overhead.
  _available[int(which_partition)], _capacity[int(which_partition)], _used[int(which_partition)],
  partition_membership_name(ssize_t(which_partition)));
  return _available[int(which_partition)];
  return capacity_of(which_partition) - used_by(which_partition);
}

// Return available_in assuming caller does not hold the heap lock. In production builds, available is
// returned without acquiring the lock. In debug builds, the global heap lock is acquired in order to
// enforce a consistency assert.
inline size_t available_in_not_locked(ShenandoahFreeSetPartitionId which_partition) const {
These changes are beyond the scope of the planned topic. I think we need to consider them more carefully. Would prefer not to mix the two. (And I personally believe the original implementation has better performance, but feel free to prove me wrong.)
}

template<bool IS_TLAB>
HeapWord* ShenandoahFreeSet::par_allocate_single_for_mutator(ShenandoahAllocRequest &req, bool &in_new_region) {
Not clear to me what prefix par_ represents. Parallel allocate (without lock?)
uint count = 0u;
for (uint i = 0u; i < max_probes; i++) {
  ShenandoahHeapRegion** shared_region_address = _directly_allocatable_regions + idx;
  ShenandoahHeapRegion* r = Atomic::load_acquire(shared_region_address);
This code has a lot more synchronization overhead than what is required for CAS allocations. load_acquire() forces a memory fence. All writes performed by other threads before the store_release() must be visible to this thread upon return from load_acquire. I would like to see some documentation that describes the coherency model that we assume/require here. Can we write this up as a comment in the header file?
Personal preference: I think there are many situations where we get better performance if we allow ourselves to see slightly old data, and we can argue that the slightly old data is "harmless". For example, if some other thread replaces the directly_allocatable_region[N] while we're attempting to allocate from directly_allocatable_region[N], we might attempt to allocate from the original region and fail. That's harmless. We'll just retry at the next probe point. If multiple probes fail to allocate, we'll take the synchronization lock and everything will be resolved there. The accumulation of atomic volatile access has a big impact on performance. I've measured this in previous experiments. You can do the same.
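A sketch of the relaxed-load variant being suggested, using HotSpot's plain Atomic::load(); allocate_lock_free() is a hypothetical stand-in for the region's CAS bump-pointer path:

// Relaxed load: we may observe a stale region pointer. That is harmless;
// allocation from a stale or retired region fails and we simply probe again.
ShenandoahHeapRegion* r = Atomic::load(shared_region_address);
if (r != nullptr) {
  HeapWord* obj = r->allocate_lock_free(req.size()); // hypothetical CAS path
  if (obj != nullptr) {
    return obj;
  }
}
// Fall through to the next probe; after max_probes failures, take the heap
// lock and resolve contention there.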
  return allocate_for_mutator(req, in_new_region);
}
// If any of the 3 consecutive directly allocatable regions is ready for retire and replacement,
// grab heap lock try to retire all ready-to-retire shared regions.
Would be preferable to allow this thread to retire all ready-to-retire regions in the directly allocatable set (not just the three that I know about) while it holds the heap lock. We do not necessarily need to keep a separate per-thread representation of ready-to-retire shared regions. This is a very rare event. Just iterate through the 13 (or whatever) regions in the directly allocatable set and ask for each whether end-top is less than min plab size.
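A sketch of that sweep, assuming hypothetical names for the slot count and the retire helper, with the heap lock already held:

// Retire every directly allocatable region whose remaining space cannot
// satisfy a minimal PLAB, not just the slots this thread probed.
shenandoah_assert_heaplocked();
for (uint i = 0; i < _directly_allocatable_region_count; i++) {  // hypothetical count
  ShenandoahHeapRegion* r = _directly_allocatable_regions[i];
  if (r != nullptr && r->free() < PLAB::min_size() * HeapWordSize) {
    retire_and_replace(i, r); // hypothetical: retire r, install a fresh region
  }
}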
@@ -287,6 +271,28 @@ class ShenandoahRegionPartitions {
  void assert_bounds() NOT_DEBUG_RETURN;
};

#define DIRECTLY_ALLOCATABLE_REGION_UNKNOWN_AFFINITY ((Thread*)-1) |
Out of time to dive deep into this right now. Wonder if it makes sense to randomly generate a hash for each thread and store this into a thread-local field. Might provide "randomness" and locality.
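One way to realize that idea, sketched with hypothetical thread-local accessors; the cached random hash gives each thread a stable, well-spread probe start:

// Hypothetical: lazily assign each thread a random probe-start hash.
static uint probe_start(Thread* thread, uint num_slots) {
  uint hash = thread->alloc_probe_hash();   // hypothetical cached field
  if (hash == 0) {
    hash = (uint)os::random() | 1u;         // nonzero marks it initialized
    thread->set_alloc_probe_hash(hash);
  }
  return hash % num_slots;                  // stable start slot per thread
}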
Shenandoah always allocates memory under the heap lock, and we have observed heavy heap lock contention on the memory allocation path. This change proposes an optimization of the mutator allocation path to reduce that contention.
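At its core, the proposal lets mutators bump a region's top pointer with CAS and takes the heap lock only to refill or retire regions. A minimal sketch of that idea (not the exact patch code), assuming top_addr() exposes the region's top field:

// Lock-free bump-pointer allocation within one region. Mutators race on the
// top pointer with CAS; the heap lock is reserved for the slow path.
HeapWord* ShenandoahHeapRegion::par_allocate(size_t word_size) {
  while (true) {
    HeapWord* obj = top();
    if (pointer_delta(end(), obj) < word_size) {
      return nullptr;          // no room: caller falls back to the locked path
    }
    HeapWord* new_top = obj + word_size;
    if (Atomic::cmpxchg(top_addr(), obj, new_top) == obj) {
      return obj;              // won the race; [obj, new_top) belongs to us
    }
    // Lost the race to another mutator; reload top and retry.
  }
}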
Progress
Issue
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171
$ git checkout pull/26171
Update a local copy of the PR:
$ git checkout pull/26171
$ git pull https://git.openjdk.org/jdk.git pull/26171/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26171
View PR using the GUI difftool:
$ git pr show -t 26171
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26171.diff